BUPT Systems in the SIGHAN Bakeoff 2007
نویسندگان
چکیده
Chinese Word Segmentation(WS), Name Entity Recognition(NER) and Part-OfSpeech(POS) are three important Chinese Corpus annotation tasks. With the great improvement in these annotations on some corpus, now, the robustness, a capability of keeping good performances for a system by automatically fitting the different corpus and standards, become a focal problem. This paper introduces the work on robustness of WS and POS annotation systems from Beijing University of Posts and Telecommunications(BUPT), and two NER systems. The WS system combines a basic WS tagger with an adaptor used to fit a specific standard given. POS taggers are built for different standards under a two step frame, both steps use ME but with incremental features. A multiple knowledge source system and a less knowledge Conditional Random Field (CRF) based systems are used for NER. Experiments show that our WS and POS systems are robust.
منابع مشابه
NEU Systems in SIGHAN Bakeoff 2012
This paper describes the methods used for the parsing the Sinica Treebank for the bakeoff task of SigHan 2012. Based on the statistics of the training data and the experimental results, we show that the major difficulties in parsing the Sinica Treebank comes from both the data sparse problem caused by the fine-grained annotation and the tagging ambiguity.
متن کاملDescription of the HKU Chinese Word Segmentation System for Sighan Bakeoff 2005
In this paper, we describe in brief our system for the Second International Chinese Word Segmentation Bakeoff sponsored by the ACL-SIGHAN. We participated in all tracks at the bakeoff. The evaluation results show our system can achieve an F measure of 0.9400.967 for different testing corpora.
متن کاملThe CIPS-SIGHAN CLP 2012 ChineseWord Segmentation onMicroBlog Corpora Bakeoff
The CIPS-SIGHAN CLP 2012 Chinese Word Segmentation on MicroBlog Corpora Bakeoff was held in the autumn of 2012. This bake-off task of Chinese word segmentation is focused on the performance of Chinese word segmentation algorithms on MicroBlog corpora. 17 groups submitted 20 results, among which the best system has all the P, R and F values near 95%, and the average values of the 17 systems are ...
متن کاملBMM-Based Chinese Word Segmentor with Word Support Model for the SIGHAN Bakeoff 2006
This paper describes a Chinese word segmentor (CWS) for the third International Chinese Language Processing Bakeoff (SIGHAN Bakeoff 2006). We participate in the word segmentation task at the Microsoft Research (MSR) closed testing track. Our CWS is based on backward maximum matching with word support model (WSM) and contextual-based Chinese unknown word identification. From the scored results a...
متن کاملIntroduction to CKIP Chinese Spelling Check System for SIGHAN Bakeoff 2013 Evaluation
In order to accomplish the tasks of identifying incorrect characters and error correction, we developed two error detection systems with different dictionaries. First system, called CKIP-WS, adopted the CKIP word segmentation system which based on CKIP dictionary as its core detection procedure; another system, called G1-WS, used Google 1T uni-gram data to extract pairs of potential error word ...
متن کامل